MULTIOBJECTIVE OPTIMIZATION: AN INERTIAL DYNAMICAL
APPROACH TO PARETO OPTIMA

HEDY ATTOUCH & GUILLAUME GARRIGOS 


Abstract. We present some first results concerning a gradient-based dynamic approach to multi-objective
optimization problems, involving inertial effects. We prove the existence of global solution
trajectories for this second-order differential equation, and their convergence to weak Pareto points 
in the convex case. It is a first step towards the design of fast numerical methods for multi-objective 
optimization. 


1. Introduction 

We propose a first study of an inertial gradient-based dynamical system for multi-objective
optimization. In a Hilbert space $\mathcal{H}$, given objective functions $f_i : \mathcal{H} \to \mathbb{R}$, $i = 1, 2, \dots, q$,
which are continuously differentiable, we consider the Inertial Multi-Objective Gradient system

(IMOG)    $m\ddot u(t) + \gamma\dot u(t) + \mathrm{co}\{\nabla f_i(u(t))\}^0 = 0.$

It is a second-order in time differential equation, where the mass $m$ and the viscous damping
coefficient $\gamma$ are fixed positive parameters, and $\mathrm{co}\{\nabla f_i(u(t))\}^0$ is the element of minimal norm of the
convex hull of the gradients of the objective functions at $u(t)$. This dynamical system combines
both aspects, inertial and multi-objective. Each of them has been the subject of active research, 
but to our knowledge the combination of both aspects has not been considered before. Let us 
review some important facts concerning each of these aspects separately. 

a) When neglecting the acceleration term, we recover the first-order system

(MOG)    $\dot u(t) + \mathrm{co}\{\nabla f_i(u(t))\}^0 = 0,$

which was first considered by Henry [27] and Cornet [18]-[20] in economics, as a dynamical mechanism
of resource allocation. It was then developed independently as an optimization tool by
Miglierina [30], Brown and Smith [15], Attouch and Goudou [7]. Its extension to the nonsmooth
setting has been recently considered in Attouch, Garrigos and Goudou [6]. As a main property of
(MOG), along its trajectories all the objective functions are decreasing. In the quasi-convex case,
the trajectories converge as $t \to +\infty$ to Pareto optima.

Various first-order algorithms for multi-objective optimization can be considered as time discretizations
of this dynamic; let us refer to the seminal work of Fliege and Svaiter [25], followed by
[13, 21, 22, 23, 31] (and the references therein). Newton-based methods have been considered by 
Drummond, Fliege and Svaiter [24], see also the multiobjective BFGS method of Povalej [34]. 

b) When considering a single criterion $f$, we recover the so-called Heavy Ball with Friction dynamic

(HBF)    $m\ddot u(t) + \gamma\dot u(t) + \nabla f(u(t)) = 0.$

This system has a clear mechanical interpretation. Just like a heavy ball sliding down the graph of
$f$, due to the viscous friction effect, each trajectory tends to stabilize at a local minimum of $f$. As
an optimization tool, this system was first considered by Polyak [33], Antipin [4], and
Attouch-Goudou-Redont [8]. The convergence of the trajectories has been proven in two basic
situations: in the case of a real analytic $f$ by Haraux and Jendoubi [26], and in the convex case by
Alvarez [1]. The basic motivation for considering second-order in time systems is that, intuitively,
inertia provides fast methods. The study of fast gradient-based methods for solving optimization
problems is an active area of research. While first-order methods are well understood and relatively
easy to implement, they generally suffer from slow convergence rates (linear). In contrast,
second-order methods usually enjoy fast convergence properties, such as super-linear or quadratic rates.

Generally speaking, there are two ways to incorporate second-order information in the dynamics
or the algorithms. A second-order analysis in space (using the Hessian of the objective function,
or some approximation) leads to Newton-like methods. The other approach, which is our main
concern, consists in using second-order information in time, by introducing in the dynamics the
second-order derivative of the trajectory. Such inertial methods are easier to implement than
Newton-like methods, but their analysis is quite delicate. Following the seminal work of Nesterov
and Güler, a very popular method is the FISTA algorithm, which was developed by Beck and
Teboulle [12], and which is an inertial version of the classical forward-backward algorithm. As a
remarkable property of this algorithm, the convergence rate of the values is $O(1/k^2)$. Recently Su,
Boyd and Candès [37] showed that this algorithm can be interpreted as a discrete version of the
second-order differential equation, in the case $\alpha = 3$,

$\ddot u(t) + \frac{\alpha}{t}\dot u(t) + \nabla f(u(t)) = 0.$

In the above equation, the viscous coefficient $\frac{\alpha}{t}$ tends to zero as $t \to +\infty$, which makes the
inertia effect more effective asymptotically than with a fixed positive viscous coefficient, a key for
fast methods (see Cabot, Engler and Gadat [16] for a general view on the asymptotic vanishing
damping effect). Convergence of the trajectories of the above system has been obtained in the
case $\alpha > 3$ by Attouch-Peypouquet-Redont [10] (continuous dynamic) and Chambolle-Dossal [17]
(discrete algorithmic case).

As a general rule, comparison of the corresponding continuous and discrete dynamics is of interest,
since they usually share asymptotically the same qualitative and quantitative properties:
convergence to a critical point of the objective function, with similar convergence rate. For a
rigorous approach of this comparison, see the works of Alvarez-Peypouquet [2, 3], and Peypouquet-Sorin [32].
Moreover the continuous dynamic is often easier to treat mathematically than the
corresponding algorithms, since we can use the flexibility of the differential and integral calculus. 
Quite often Lyapunov functions are first discovered in the continuous case, and then transposed to 
the algorithms. 

Thus our program consists in studying the Inertial Multi-Objective Gradient system, by combining
the techniques which have been described above. In section 2, we recall briefly some aspects of
multiobjective optimization, and in particular the first-order steepest descent method evoked above.
Then we introduce our dynamic, namely the Inertial MultiObjective Gradient (IMOG) system. In 
section 3, we investigate the existence of solution trajectories of (IMOG) in finite dimensions. In 
section 4, we study the properties of the trajectories generated by (IMOG). Under a convexity 
assumption on the objective functions, we show that the bounded trajectories converge to weak 
Pareto points of the problem. Of course, due to the effects of inertia, (IMOG) is not a descent 
dynamic, i.e. the values of the cost functions may not decrease over time. But we show that, 
with an appropriate choice of the initial velocity, the cost values are improved along the trajectory 
relative to the starting point. 

2. Pareto Optimality and Multi-objective steepest descent direction 

Let $f_i : \mathcal{H} \to \mathbb{R}$ ($i \in \{1,\dots,q\}$) be a finite family of real-valued functions. We suppose in
this paper that they are continuously differentiable, with gradients being Lipschitz continuous on
bounded sets. Denote by $F : \mathcal{H} \to \mathbb{R}^q$ the vector-valued function defined by $F(u) := (f_i(u))_{i \in \{1,\dots,q\}}$,
and consider the associated vector optimization problem

(P)    $\mathrm{MIN}_{u \in \mathcal{H}}\ F(u).$

Let us make precise the notions of solutions we consider for (P).

Take the canonical order $\preceq$ on $\mathbb{R}^q$ defined by

(1)    $a \preceq b \iff \forall i \in \{1,\dots,q\},\ a_i \le b_i,$

which induces a strict order: $a \precneqq b \iff a \preceq b$ and $a \ne b$. Then, we say that $u \in \mathcal{H}$ is a Pareto efficient
point (or Pareto optimum) of (P) whenever the sublevel set $\{v \in \mathcal{H} \mid F(v) \precneqq F(u)\}$ is empty. In
other words, Pareto optima are points having the property that none of the objective functions
can be improved in value without degrading some of the other objective values. We can also equip
$\mathbb{R}^q$ with a weaker strict order $\prec$, defined by

(2)    $a \prec b \iff \forall i \in \{1,\dots,q\},\ a_i < b_i.$

To this weaker strict order corresponds a weaker notion of Pareto efficiency: we say that $u \in \mathcal{H}$
is a weak Pareto efficient point (or weak Pareto optimum) of (P) if $\{v \in \mathcal{H} \mid F(v) \prec F(u)\}$ is
empty. It is clear that any Pareto optimum is in particular a weak Pareto optimum. Note that in
the case of a single objective function $f$ (i.e. $q = 1$), Pareto and weak Pareto optima coincide with
the notion of global minimizer of $f$.
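To make the two orders concrete, here is a minimal sketch that filters a finite list of objective vectors $F(u)$ according to the definitions above. The function names are illustrative and not taken from the paper, and restricting to a finite candidate set is an assumption made only for the example.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a <= b componentwise and a != b: the strict order used for Pareto optima."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def strictly_dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a_i < b_i for every i: the weaker strict order (2)."""
    return all(x < y for x, y in zip(a, b))


def pareto_points(values: List[Vector]) -> List[Vector]:
    """Vectors that no other candidate dominates."""
    return [v for v in values if not any(dominates(w, v) for w in values)]


def weak_pareto_points(values: List[Vector]) -> List[Vector]:
    """Vectors that no other candidate strictly dominates in every component."""
    return [v for v in values if not any(strictly_dominates(w, v) for w in values)]


# Hypothetical objective values F(u) for four candidate points u.
candidates = [(1, 2), (2, 1), (1, 3), (3, 3)]
print(pareto_points(candidates))       # [(1, 2), (2, 1)]
print(weak_pareto_points(candidates))  # [(1, 2), (2, 1), (1, 3)]
```

In this example $(1, 3)$ is a weak Pareto value but not a Pareto value, since $(1, 2)$ improves the second objective without degrading the first.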

Still in this mono-criteria case, a known necessary condition for $u \in \mathcal{H}$ to be a minimizer is
$\nabla f(u) = 0$. A generalization of this Fermat's rule holds for Pareto optima:

Proposition 2.1. (Fermat's rule) If $u$ is a weak Pareto point of (P), then $0 \in \mathrm{co}\{\nabla f_i(u)\}$.

By analogy with the mono-criteria case, we say that $u$ is a critical Pareto point whenever
$0 \in \mathrm{co}\{\nabla f_i(u)\}$. This notion has been considered by Smale in [36] and Cornet in [18]; see [11] for a recent
account of this notion, and various extensions of it. As we can expect, this first-order necessary
optimality condition for local multi-objective optimization becomes sufficient in the convex setting
(see for instance [6, Lemma 1.3]):

Proposition 2.2. If each objective function $(f_i)_{i \in \{1,\dots,q\}}$ is convex, then Pareto critical points
coincide with weak Pareto points. If the functions are strictly convex, then the same holds for
Pareto optima.

These last properties justify the search for critical Pareto points, just as critical points are looked
for in the single objective case.

We now introduce the steepest descent vector field:

(3)    $s : \mathcal{H} \to \mathcal{H}, \quad u \mapsto s(u) := -\mathrm{co}\{\nabla f_i(u)\}^0,$

where $\mathrm{co}\{\nabla f_i(u)\}^0$ denotes the element of minimal norm of the convex compact set $\mathrm{co}\{\nabla f_i(u)\}$. The
vector $s(u)$ is called the multi-objective steepest descent direction at $u$, and simply reduces to
$-\nabla f(u)$ if $q = 1$. It enjoys the following nice properties, which extend known facts about $-\nabla f(u)$
in the mono-criteria case:

i) $s(u) = 0$ if and only if $u$ is Pareto critical.

ii) $s(u)$ is a common descent direction at $u$ for all the objective functions. More exactly,

(4)    $\forall i \in \{1,\dots,q\}, \quad \langle \nabla f_i(u), s(u) \rangle \le -\|s(u)\|^2.$

iii) It is the steepest common descent direction, in the sense that

(5)    $\frac{s(u)}{\|s(u)\|} = \operatorname{argmin}_{d \in \mathcal{H},\ \|d\| \le 1}\ \max_{i \in \{1,\dots,q\}} \langle \nabla f_i(u), d \rangle$ whenever $s(u) \ne 0$.

Item i) is immediate since $s(u)$ is defined as the element of minimal norm of $-\mathrm{co}\{\nabla f_i(u)\}$. Item ii)
comes immediately once seeing that $s(u)$ is the projection of the origin onto $-\mathrm{co}\{\nabla f_i(u)\}$. Item iii)
makes use of duality arguments, see for instance [18, Proposition 3.1] or [6, Theorem 1.8] for more
details on the proof.
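For two objectives the convex hull of the gradients is a segment, so its element of minimal norm has a closed form. The sketch below uses this special case to compute $s(u)$ and check the descent property ii) numerically. It is an illustration under the assumption $q = 2$, with hypothetical helper names; for general $q$ a small quadratic program over the unit simplex would be needed instead.

```python
from typing import Sequence, Tuple

Vector = Tuple[float, ...]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def min_norm_in_segment(g1: Sequence[float], g2: Sequence[float]) -> Vector:
    """Element of minimal norm of co{g1, g2}, i.e. of the segment joining g2 to g1."""
    d = tuple(a - b for a, b in zip(g1, g2))
    dd = dot(d, d)
    if dd == 0.0:  # both gradients coincide
        return tuple(float(a) for a in g1)
    # Minimise ||g2 + theta * (g1 - g2)||^2 over theta in [0, 1].
    theta = min(1.0, max(0.0, -dot(g2, d) / dd))
    return tuple(b + theta * di for b, di in zip(g2, d))


def steepest_descent_two(g1: Sequence[float], g2: Sequence[float]) -> Vector:
    """Multi-objective steepest descent direction for q = 2: minus the min-norm element."""
    return tuple(-p for p in min_norm_in_segment(g1, g2))


# Illustrative gradients of two objectives at some point u.
g1, g2 = (1.0, 0.0), (0.0, 1.0)
s = steepest_descent_two(g1, g2)
print(s)  # (-0.5, -0.5)
for g in (g1, g2):
    # Property (4): <grad f_i(u), s(u)> <= -||s(u)||^2 (small tolerance for rounding).
    assert dot(g, s) <= -dot(s, s) + 1e-12
```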

Because of its properties, it seems natural to consider the dynamic governed by the steepest
descent vector field $s : \mathcal{H} \to \mathcal{H}$, namely the MultiObjective Gradient system:

(6)    (MOG)    $\dot u(t) - s(u(t)) = 0, \quad t \in [0, +\infty[.$

Indeed, at each $t$, $\dot u(t)$ is a common descent direction for the objective functions, so they should
all decrease along the trajectories. Also, equilibrium points of the dynamic are exactly critical
Pareto points. Note that the idea of constructing a dynamic having these properties goes back to
Smale in [36]. This dynamic has been recently studied in [7, 6], where the authors prove the
cooperative nature of this dynamic, that is the common decrease of the objective functions along
the trajectories. Moreover, the trajectories are proved to weakly converge to Pareto critical points,
in the convex case. This steepest descent dynamic can be seen as the continuous version of the
gradient method introduced by Fliege and Svaiter in [25], which shares the same qualitative and
asymptotic behavior with (MOG).

Our purpose is to consider a modified version of this steepest descent dynamic, by introducing
inertial effects. Take $m, \gamma > 0$, and consider the following Inertial MultiObjective Gradient dynamic:

(7)    (IMOG)    $m\ddot u(t) + \gamma\dot u(t) - s(u(t)) = 0, \quad t \in [0, +\infty[.$

From a physical point of view, the parameter $m$ can be interpreted as the mass of the physical
point $u(t)$, on which acts the sum of two forces: the friction $-\gamma\dot u(t)$ and the vector field $s(u(t))$.
One can see, at least formally, that we recover the first-order steepest descent dynamic (MOG) by
letting the mass $m$ go to zero. Moreover, it reduces to the classical Heavy Ball with Friction (see
(8)) when $q = 1$, that is:

(8)    (HBF)    $m\ddot u(t) + \gamma\dot u(t) + \nabla f(u(t)) = 0, \quad t \in [0, +\infty[.$

As we will see in section 4, the (IMOG) dynamic shares similar properties with (HBF).


3. Existence of trajectories for (IMOG) 


3.1. Existence of trajectories. In this section, we address the existence of solutions for the
Cauchy problem associated to (IMOG). Let $t_0 \in \mathbb{R}$, $T \in ]t_0, +\infty]$, and $(u_0, \dot u_0) \in \mathcal{H}^2$. We say that
$u : [t_0, T[ \to \mathcal{H}$ is a solution of (IMOG) if $u$ is continuous on $[t_0, T[$, of class $C^2$ on $]t_0, T[$, and
satisfies

(IMOG)    $m\ddot u(t) = -\gamma\dot u(t) + s(u(t))$ for all $t \in ]t_0, T[$,
          $u(t_0) = u_0$, $\dot u(t_0) = \dot u_0$.

In order to apply an existence theorem to the dynamical system (IMOG), the key point is the
regularity of the steepest descent vector field $s$. We recall the following result from [7]:

Proposition 3.1. Recalling that the gradients $\nabla f_i : \mathcal{H} \to \mathcal{H}$ are Lipschitz continuous on bounded
sets, we have that $s$ is $\frac{1}{2}$-Hölder continuous on bounded sets.

This result is nearly optimal since there exist simple situations for which $s$ is not a locally Lipschitz
continuous vector field (see [6, Example 2]). Hence, there is no hope to apply the Cauchy-Lipschitz
theorem to get existence and uniqueness of the trajectories. We will use instead Peano's existence
result (see for instance [5, Theorem 2.8]):

Theorem 3.2. (Peano) Let $\varphi : \mathbb{R}^n \to \mathbb{R}^n$ be continuous. Then, for all $x_0 \in \mathbb{R}^n$ and $t_0 \in \mathbb{R}$, there
exists some $T > 0$ and $x : [t_0, t_0 + T[ \to \mathbb{R}^n$ of class $C^1$, such that

(9)    $\dot x(t) = \varphi(x(t))$ for all $t \in [t_0, t_0 + T[$, with $x(t_0) = x_0$.

As one knows, Peano's result asks for less regularity but applies only in finite dimension. Moreover,
contrary to the Cauchy-Lipschitz theorem, uniqueness is not guaranteed here (it will be discussed
later). The ingredients are now all gathered to get a first local existence result:

Proposition 3.3. Suppose that $\mathcal{H}$ has finite dimension. For all $t_0 \in \mathbb{R}$, for all $(u_0, \dot u_0) \in \mathcal{H} \times \mathcal{H}$,
there exists some $T > 0$ and $u : [t_0, t_0 + T[ \to \mathcal{H}$ of class $C^2$, such that

(10)    $m\ddot u(t) = -\gamma\dot u(t) + s(u(t))$ for all $t \in [t_0, t_0 + T[$, with $u(t_0) = u_0$, $\dot u(t_0) = \dot u_0$.

Proof. We just need to apply a change of variables in (IMOG) to get a first-order ODE. Let
$\varphi : \mathcal{H} \times \mathcal{H} \to \mathcal{H} \times \mathcal{H}$ be defined by

(11)    $\varphi(u, v) := \frac{1}{m}(mv, -\gamma v + s(u)).$

Clearly, from Proposition 3.1, $\varphi$ is continuous on $\mathcal{H}^2$. We can then apply Peano's Theorem at $t_0$
and $x_0 := (u_0, \dot u_0)$, to get some $x : [t_0, t_0 + T[ \to \mathcal{H} \times \mathcal{H}$ of class $C^1$ such that (9) holds. If we write
$x(t) = (u(t), v(t)) \in \mathcal{H} \times \mathcal{H}$, (9) can be rewritten as:

(12)    $\dot u(t) = v(t)$, $m\dot v(t) = -\gamma v(t) + s(u(t))$ for all $t \in [t_0, t_0 + T[$, with $u(t_0) = u_0$, $v(t_0) = \dot u_0$.

Since $x$ is of class $C^1$, we deduce that it is also the case for $u$ and $v$. But from $\dot u(t) = v(t)$, we can
see that $u$ is of class $C^2$ and satisfies (10). □
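The reduction to a first-order system used in this proof is also how one would feed (IMOG) to a generic ODE integrator. The following sketch builds the first-order field $(u, v) \mapsto \frac{1}{m}(mv, -\gamma v + s(u))$ for a user-supplied field `s` and advances it with a plain forward Euler step. The names `make_phi` and `euler`, and the choice of Euler's method, are illustrative assumptions rather than anything prescribed by the paper.

```python
from typing import Callable, Sequence, Tuple

Vector = Tuple[float, ...]


def make_phi(s: Callable[[Vector], Sequence[float]], m: float, gamma: float):
    """First-order field of (IMOG): (u, v) -> (v, (-gamma * v + s(u)) / m)."""
    def phi(u: Vector, v: Vector) -> Tuple[Vector, Vector]:
        su = s(u)
        du = tuple(v)
        dv = tuple((-gamma * vi + si) / m for vi, si in zip(v, su))
        return du, dv
    return phi


def euler(phi, u0: Sequence[float], v0: Sequence[float], dt: float, n_steps: int):
    """Forward Euler integration of x' = phi(x) with x = (u, v)."""
    u, v = tuple(u0), tuple(v0)
    for _ in range(n_steps):
        du, dv = phi(u, v)
        u = tuple(ui + dt * d for ui, d in zip(u, du))
        v = tuple(vi + dt * d for vi, d in zip(v, dv))
    return u, v


# Single-objective sanity check (q = 1): f(u) = ||u||^2 / 2, so s(u) = -u and
# (IMOG) is the Heavy Ball with Friction system, whose trajectories decay to 0.
phi = make_phi(lambda u: tuple(-c for c in u), m=1.0, gamma=1.0)
u_end, v_end = euler(phi, (1.0, 0.0), (0.0, 1.0), dt=0.01, n_steps=3000)
print(u_end, v_end)
```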

Remark 3.4. For this result we use Theorem 3.2, which asks the space to be finite dimensional.
In fact, Peano's theorem can be stated in the Banach space setting, if one asks the vector field
involved to be compact. We recall that a map $\varphi : \mathcal{H} \to \mathcal{H}$ is said to be compact whenever
it is continuous and maps bounded sets to relatively compact sets. Observe that if the gradients
$\nabla f_i$ are all compact, then $s$ is also compact. Hence, one might want to apply Peano's result in this
context. Nevertheless, by reducing (IMOG) to a first-order ODE, we do not deal directly with $s$
but with $(u, v) \mapsto \frac{1}{m}(mv, -\gamma v + s(u))$. And it can be easily proved that if this map is compact, then $v \mapsto v$
is also compact, which would mean that $\mathcal{H}$ has finite dimension.

We can now state our main existence result. To get a global solution on $[t_0, +\infty[$, we make a stronger
assumption on the gradients.

Theorem 3.5. Suppose that $\mathcal{H}$ has finite dimension, and that the gradients $\nabla f_i$ are globally
Lipschitz continuous. Then, for all $t_0 \in \mathbb{R}$, $(u_0, \dot u_0) \in \mathcal{H} \times \mathcal{H}$, there exists $u : [t_0, +\infty[ \to \mathcal{H}$ of class
$C^2$ such that

(13)    $m\ddot u(t) = -\gamma\dot u(t) + s(u(t))$ for all $t \in [t_0, +\infty[$, with $u(t_0) = u_0$, $\dot u(t_0) = \dot u_0$.

Proof of Theorem 3.5. Proposition 3.3 provides us with a local solution and, using Zorn's lemma, we
can suppose that it is a maximal solution $u : [t_0, T[ \to \mathcal{H}$, with $T \in ]t_0, +\infty]$. The whole point is
to prove that $T = +\infty$. For this, we argue by contradiction by supposing that $T < +\infty$. We will
show that the solution does not blow up in finite time, and extend it at $T$ to obtain a contradiction.

Using the fact that the gradients are globally Lipschitz continuous, we can derive the following
global growth property for $s$:

(14)    $\exists c > 0$ s.t. $\forall u \in \mathcal{H}$, $\|s(u)\| \le c(1 + \|u\|)$.

Indeed, for all $u \in \mathcal{H}$, there exists $\theta(u)$ in the unit simplex $S^q$ such that

(15)    $\|s(u)\| = \Big\| \sum_{i=1}^q \theta_i(u) \nabla f_i(u) \Big\| \le \sum_{i=1}^q \theta_i(u) \big( \|\nabla f_i(u) - \nabla f_i(0)\| + \|\nabla f_i(0)\| \big).$

Then (14) holds by taking for instance $c = \max_{i \in \{1,\dots,q\}} \max\{\|\nabla f_i(0)\|;\ \mathrm{Lip}(\nabla f_i; \mathcal{H})\}$.

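From this growth condition, we will obtain some energy estimates on the trajectory. Let us show
that $u$ and $\dot u$ lie in $L^\infty(t_0, T; \mathcal{H})$. For this, we consider as before

(16)    $\varphi : \mathcal{H} \times \mathcal{H} \to \mathcal{H} \times \mathcal{H}, \quad (u, v) \mapsto \varphi(u, v) = \frac{1}{m}(mv, -\gamma v + s(u)).$

By defining $x(t) := (u(t), \dot u(t))$ for all $t \in [t_0, T[$, we see that $\dot x(t) = \varphi(x(t))$ on $[t_0, T[$. Define
$h(t) := \|x(t) - x(t_0)\|$ on $[t_0, T[$, which is continuous on $[t_0, T[$. Equip $\mathcal{H} \times \mathcal{H}$ with the scalar product
inherited from $\mathcal{H}$, and note that $h^2$ is differentiable on $[t_0, T[$, so we can write for all $t \in [t_0, T[$:

(17)    $\frac{1}{2}\frac{d}{dt} h^2(t) = \langle \dot x(t), x(t) - x(t_0) \rangle = \langle \varphi(x(t)), x(t) - x(t_0) \rangle \le \|\varphi(x(t))\|\, h(t).$

From the growth condition (14) we deduce an upper bound for $\|\varphi(x(t))\|$. Indeed, for all
$x = (u, v) \in \mathcal{H}^2$,

$\|\varphi(x)\| \le \|v\| + \frac{1}{m}\|s(u) - \gamma v\|$    using the equivalence between the $\ell^1$ and $\ell^2$ norms
$\le \big(1 + \frac{\gamma}{m}\big)\|v\| + \frac{c}{m}(1 + \|u\|)$    using the triangle inequality with (14)
$\le c_2(1 + \|x\|)$    with $c_2 := \sqrt{2}\,\max\{\frac{c}{m};\ 1 + \frac{\gamma}{m}\}$.

Using the triangle inequality with $c_3 := c_2(1 + \|x(t_0)\|)$, it follows for all $t \in [t_0, T[$ that

(18)    $\|\varphi(x(t))\| \le c_3(1 + h(t)).$

Combining (17) and (18), we obtain

(19)    $\frac{1}{2}\frac{d}{dt} h^2(t) \le c_3\, h(t)(1 + h(t))$ for all $t \in [t_0, T[$.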


We will now conclude by using a Gronwall-type argument. Consider an arbitrary $\varepsilon \in ]0, T - t_0[$.
After integration of (19) on $[t_0, T - \varepsilon]$, and using $h(t_0) = 0$, we obtain

(20)    $\frac{1}{2} h^2(t) \le \int_{t_0}^{t} c_3 (1 + h(s))\, h(s)\, ds$ for all $t \in [t_0, T - \varepsilon]$.

Since $h$ is continuous on $[t_0, T - \varepsilon]$, the function $g : t \in [t_0, T - \varepsilon] \mapsto c_3(1 + h(t))$ is in $L^1([t_0, T - \varepsilon]; \mathbb{R})$.
Hence we can apply Lemma A.2 (stated in the Appendix) to obtain

(21)    $h(t) \le \int_{t_0}^{t} c_3 (1 + h(s))\, ds$ for all $t \in [t_0, T - \varepsilon]$.

We easily obtain from (21) and $T < +\infty$ that

(22)    $h(t) \le c_3 T + c_3 \int_{t_0}^{t} h(s)\, ds$ for all $t \in [t_0, T - \varepsilon]$,

so we can use the Gronwall-Bellman Lemma (see Lemma A.1 in the Appendix), and obtain:
(23)    $h(t) \le c_3 T e^{c_3 T}$ for all $t \in [t_0, T - \varepsilon]$.


Since the upper bound in (23) is independent of $\varepsilon$ and $t$, we deduce that $h \in L^\infty(0, T; \mathbb{R})$.

From the definition of $h$, we obtain that $u$ and $\dot u$ lie in $L^\infty(0, T; \mathcal{H})$. Moreover, using the growth
condition (14), we see that $s \circ u \in L^\infty(0, T; \mathcal{H})$, so $\ddot u(t) = \frac{1}{m}(s(u(t)) - \gamma\dot u(t))$ lies also in $L^\infty(0, T; \mathcal{H})$.
Now, since $T$ is supposed finite, we can say that $L^\infty(0, T; \mathcal{H}) \subset L^1(0, T; \mathcal{H})$, so $u$ can be extended
continuously at $T$ by $u(T) := u(0) + \int_0^T \dot u(t)\, dt$, and we can do the same for $\dot u$. Hence, we can
apply Proposition 3.3 at $t_0 = T$ with $(u_0, \dot u_0) = (u(T), \dot u(T))$ to extend the solution $u(\cdot)$, which
contradicts its maximality. □

As already observed in [6], the steepest descent vector field governing the dynamic is neither
Lipschitz nor monotone (even in the convex setting). So we cannot use methods from monotone
operator theory, and the question of uniqueness of the trajectories remains open in the general
context. Nevertheless, under some assumptions, we can still ensure uniqueness.

Proposition 3.6. Let $u$ be a trajectory solution of the Cauchy problem (13). Suppose that for all
$t \in [t_0, +\infty[$, $(\nabla f_i(u(t)))_{i = 1,\dots,q}$ are linearly independent vectors. Then $u$ is the unique solution to
(13). When $q = 2$, the same conclusion holds, just assuming that $\nabla f_1(u(t)) \ne \nabla f_2(u(t))$.

Proof. It is proved in [6, Proposition 3.4] that under these hypotheses, for all $t \in [t_0, +\infty[$, the
steepest descent vector field is locally Lipschitz in the neighbourhood of $u(t)$. Hence, it suffices to
apply the Cauchy-Lipschitz theorem instead of Peano's to derive the uniqueness of $u$. □

3.2. Examples. 

Example 3.7. Take the quadratic functions $f_1(x, y) = \frac{1}{2}(x + 1)^2 + \frac{1}{2}y^2$ and $f_2(x, y) = \frac{1}{2}(x - 1)^2 + \frac{1}{2}y^2$.

The corresponding Pareto set is $[-1, +1] \times \{0\}$ and the steepest descent vector field is given by:

(24)    $s(x, y) = -(x - 1, y)$ if $x > 1$,
        $s(x, y) = -(0, y)$ if $-1 \le x \le 1$,
        $s(x, y) = -(x + 1, y)$ if $x < -1$.

Figure 1 shows some trajectories of the (IMOG) dynamic, with the steepest descent vector field
plotted in the background. We used the following parameters: $m = 1$, and $(u_0, \dot u_0)$ are taken randomly.
Here the trajectories are computed exactly, since in this simple example (IMOG) can be solved
explicitly. For each trajectory, the initial point is indicated by the symbol ×, and the limit point
by ©. We can observe the following: the trajectories all converge to a Pareto point, the dynamic
is clearly not a descent method, and can be highly oscillating whenever the friction parameter is
too close to zero.
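To illustrate why this example is explicitly solvable, here is a worked computation on the middle strip only, where the steepest descent field is $-(0, y)$. It is a sketch written for this example, valid for as long as the trajectory stays in the strip $-1 \le x \le 1$; outside the strip the $x$-equation picks up a linear restoring term and is solved in the same way.

```latex
% On the strip -1 <= x <= 1, s(x,y) = -(0,y), so (IMOG) decouples into
\begin{align*}
  m\ddot x(t) + \gamma \dot x(t) &= 0, &
  m\ddot y(t) + \gamma \dot y(t) + y(t) &= 0 .
\end{align*}
% The first equation integrates to
\begin{equation*}
  x(t) = x_0 + \frac{m}{\gamma}\,\dot x_0\,\bigl(1 - e^{-\frac{\gamma}{m}(t - t_0)}\bigr),
\end{equation*}
% and the second is a damped linear oscillator with characteristic roots
\begin{equation*}
  r_\pm = \frac{-\gamma \pm \sqrt{\gamma^2 - 4m}}{2m} .
\end{equation*}
% Hence y oscillates around 0 exactly when gamma^2 < 4m (with m = 1: gamma < 2),
% and decays without oscillation when gamma^2 >= 4m.
```

This is consistent with the observation that the trajectories oscillate strongly when the friction parameter is small.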



Figure 1. Friction parameter $\gamma$

Example 3.8. Let $f_1(x, y) = \frac{1}{2}(x^2 + y^2)$ and $f_2(x, y) = x$. The corresponding Pareto set is
$]-\infty, 0] \times \{0\}$, plotted in blue in Figure 2. Once computed, we see that the steepest descent vector
field is defined according to three areas of the plane (these areas are delimited by red lines
in Figure 2):

(25)    $s(x, y) = -(1, 0)$ if $x \ge 1$,
        $s(x, y) = -(x, y)$ if $(x - \frac{1}{2})^2 + y^2 \le \frac{1}{4}$,
        $s(x, y) = -\frac{1}{(x - 1)^2 + y^2}\,(y^2,\ y(1 - x))$ else.
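The three-region formula of Example 3.8 can be cross-checked against the definition of $s$ as minus the minimal-norm element of the segment joining $\nabla f_1(x, y) = (x, y)$ and $\nabla f_2(x, y) = (1, 0)$. The sketch below does this on a small grid; the function names and the grid are illustrative choices, not part of the paper.

```python
from typing import Tuple


def s_example_38(x: float, y: float) -> Tuple[float, float]:
    """Piecewise expression of the steepest descent field for f1 = (x^2+y^2)/2, f2 = x."""
    if x >= 1.0:
        return (-1.0, 0.0)
    if (x - 0.5) ** 2 + y ** 2 <= 0.25:
        return (-x, -y)
    denom = (x - 1.0) ** 2 + y ** 2
    return (-(y ** 2) / denom, -y * (1.0 - x) / denom)


def s_by_projection(x: float, y: float) -> Tuple[float, float]:
    """Same field, computed as minus the min-norm point of the segment [grad f2, grad f1]."""
    g1, g2 = (x, y), (1.0, 0.0)
    d = (g1[0] - g2[0], g1[1] - g2[1])
    dd = d[0] ** 2 + d[1] ** 2
    if dd == 0.0:  # both gradients coincide
        return (-1.0, 0.0)
    theta = min(1.0, max(0.0, -(g2[0] * d[0] + g2[1] * d[1]) / dd))
    return (-(g2[0] + theta * d[0]), -(g2[1] + theta * d[1]))


# Compare both expressions on a coarse grid covering the three regions.
max_gap = 0.0
for i in range(-4, 5):
    for j in range(-4, 5):
        x, y = 0.5 * i, 0.5 * j
        a, b = s_example_38(x, y), s_by_projection(x, y)
        max_gap = max(max_gap, abs(a[0] - b[0]), abs(a[1] - b[1]))
print(max_gap)
```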

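In this case we plotted the trajectories using an explicit discretization in time of (IMOG):

$m\,\frac{u_{n+1} - 2u_n + u_{n-1}}{\tau^2} + \gamma\,\frac{u_{n+1} - u_n}{\tau} - s(u_n) = 0$

$\iff \quad u_{n+1} = u_n + \frac{m}{m + \gamma\tau}(u_n - u_{n-1}) + \frac{\tau^2}{m + \gamma\tau}\, s(u_n).$

We took $m = 1$, $\gamma = 1$ and $\tau = 0.05$, and again the initial point for each trajectory is indicated by
the symbol ×, and the limit point by ©.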
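A possible implementation of such an explicit scheme is sketched below, applied to the field of Example 3.8. It assumes the standard centered difference for the acceleration and a forward difference for the velocity, with the field evaluated at the current iterate; the first step, which uses the initial velocity to produce $u_1$, is a choice made for this sketch and is not specified in the text.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Tuple[float, ...]


def s_example_38(u: Vector) -> Vector:
    """Steepest descent field of Example 3.8 (f1 = (x^2 + y^2)/2, f2 = x)."""
    x, y = u
    if x >= 1.0:
        return (-1.0, 0.0)
    if (x - 0.5) ** 2 + y ** 2 <= 0.25:
        return (-x, -y)
    denom = (x - 1.0) ** 2 + y ** 2
    return (-(y ** 2) / denom, -y * (1.0 - x) / denom)


def imog_explicit(s: Callable[[Vector], Sequence[float]], u0: Sequence[float],
                  v0: Sequence[float], m: float = 1.0, gamma: float = 1.0,
                  tau: float = 0.05, n_steps: int = 2000) -> List[Vector]:
    """Iterate m (u+ - 2u + u-)/tau^2 + gamma (u+ - u)/tau - s(u) = 0, solved for u+."""
    u_prev = tuple(float(c) for c in u0)
    # Assumed start-up step: u_1 = u_0 + tau * v_0.
    u = tuple(c + tau * w for c, w in zip(u_prev, v0))
    traj = [u_prev, u]
    for _ in range(n_steps):
        su = s(u)
        u_next = tuple(
            c + (m * (c - p) + tau ** 2 * f) / (m + gamma * tau)
            for c, p, f in zip(u, u_prev, su)
        )
        u_prev, u = u, u_next
        traj.append(u)
    return traj


trajectory = imog_explicit(s_example_38, u0=(2.0, 1.0), v0=(0.0, 0.0))
print(trajectory[-1])  # last iterate; compare with the Pareto set ]-inf, 0] x {0}
```

Evaluating $s$ at $u_n$ keeps each step explicit, at the price of a step-size restriction for stability.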



Figure 2

4. Properties of the solutions of (IMOG)

For a given function $\phi : \mathcal{H} \to \mathcal{H}$ and a nonempty subset $A \subset \mathcal{H}$, denote by $\mathrm{Lip}(\phi; A)$ the best
Lipschitz constant of $\phi$ over $A$, that is $\mathrm{Lip}(\phi; A) := \sup_{x \ne y \in A} \frac{\|\phi(x) - \phi(y)\|}{\|x - y\|}$. We say that $\phi$ is Lipschitz
over $A$ whenever $\mathrm{Lip}(\phi; A) < +\infty$.

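4.1. A dissipative system. We start our study of the dynamic by showing that it is a dissipative
system. Before that, we need the following chain rule:

Lemma 4.1. (Chain rule) Let $\phi : \mathcal{H} \to \mathbb{R}$ and $u : I \to \mathcal{H}$, where $I$ is a non-empty open subset
of $\mathbb{R}$. Suppose that $\phi$ and $u$ are continuously differentiable, that $\dot u$ is locally Lipschitz continuous
on $I$, and that $\mathrm{Lip}(\nabla\phi; u(I)) < +\infty$. Then for a.e. $t \in I$,

(26)    $\frac{d^2}{dt^2}(\phi \circ u)(t) \le \mathrm{Lip}(\nabla\phi; u(I))\,\|\dot u(t)\|^2 + \langle \nabla\phi(u(t)), \ddot u(t) \rangle.$

Proof. By hypothesis, $\dot u$ and $\nabla\phi \circ u$ are locally Lipschitz continuous, hence differentiable almost
everywhere. So, from $\frac{d}{dt}(\phi \circ u)(t) = \langle \nabla\phi \circ u(t), \dot u(t) \rangle$, we have for a.e. $t \in I$

(27)    $\frac{d^2}{dt^2}(\phi \circ u)(t) = \langle \frac{d}{dt}(\nabla\phi \circ u)(t), \dot u(t) \rangle + \langle \nabla\phi \circ u(t), \ddot u(t) \rangle.$

Moreover, for the first term of the right-hand side we have (using the Cauchy-Schwarz inequality and the Lipschitz
property of $\nabla\phi$):

$\langle \frac{d}{dt}(\nabla\phi \circ u)(t), \dot u(t) \rangle = \lim_{h \to 0} \frac{1}{h}\langle \nabla\phi \circ u(t + h) - \nabla\phi \circ u(t), \dot u(t) \rangle$
$\le \lim_{h \to 0} \frac{1}{|h|}\|\nabla\phi \circ u(t + h) - \nabla\phi \circ u(t)\|\,\|\dot u(t)\|$
$\le \lim_{h \to 0} \frac{L}{|h|}\|u(t + h) - u(t)\|\,\|\dot u(t)\| = L\|\dot u(t)\|^2,$

where $L := \mathrm{Lip}(\nabla\phi; u(I))$.

□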

□ 


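Let us now prove the dissipativity of our dynamic:

Proposition 4.2. (Dissipative property) Let $u : [t_0, T[ \to \mathcal{H}$ be a solution of (IMOG). For all
$i \in \{1,\dots,q\}$, define for all $t \in [t_0, T[$:

(28)    $\mathcal{E}_i(t) := (f_i \circ u)(t) + \frac{m}{\gamma}(f_i \circ u)'(t) + m\|\dot u(t)\|^2.$

Then, for a.e. $t \in [t_0, T[$, if $L_i := \mathrm{Lip}(\nabla f_i; u([t_0, T[)) < +\infty$, we have

(29)    $\mathcal{E}_i'(t) \le -\frac{m^2}{\gamma}\|\ddot u(t)\|^2 - \frac{1}{\gamma}(\gamma^2 - mL_i)\,\|\dot u(t)\|^2.$

Proof. The dissipative property is a direct consequence of the variational characterisation of the
projection of $0$ onto $\mathrm{co}\{\nabla f_i(u(t))\}$ in (IMOG). Indeed, for a.e. $t \in [t_0, T[$, we have
$-m\ddot u(t) - \gamma\dot u(t) = \mathrm{proj}_{\mathrm{co}\{\nabla f_i(u(t))\}}(0)$. It follows that

(30)    $\langle m\ddot u(t) + \gamma\dot u(t),\ \nabla f_i(u(t)) + m\ddot u(t) + \gamma\dot u(t) \rangle \le 0,$

which is equivalent, after distributing the terms and dividing by $\gamma$, to

(31)    $\frac{m}{\gamma}\langle \nabla f_i(u(t)), \ddot u(t) \rangle + \frac{d}{dt}\big[(f_i \circ u) + m\|\dot u\|^2\big](t) \le -\frac{m^2}{\gamma}\|\ddot u(t)\|^2 - \gamma\|\dot u(t)\|^2.$

Now use Lemma 4.1 with $\mathrm{Lip}(\nabla f_i; u([t_0, T[)) < +\infty$ to obtain

$\frac{m}{\gamma}\frac{d^2}{dt^2}(f_i \circ u)(t) + \frac{d}{dt}\big[(f_i \circ u) + m\|\dot u\|^2\big](t) \le -\frac{m^2}{\gamma}\|\ddot u(t)\|^2 - \big(\gamma - \frac{mL_i}{\gamma}\big)\|\dot u(t)\|^2,$

which ends the proof. □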
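For convenience, the algebra behind the passage from (30) to (31) can be written out as follows. This is only an expanded version of the step "distributing the terms and dividing by $\gamma$", with the time argument omitted.

```latex
\begin{align*}
0 &\ge \langle m\ddot u + \gamma\dot u,\ \nabla f_i(u) + m\ddot u + \gamma\dot u \rangle \\
  &= m\langle \nabla f_i(u), \ddot u \rangle + \gamma\langle \nabla f_i(u), \dot u \rangle
     + m^2\|\ddot u\|^2 + 2m\gamma\langle \ddot u, \dot u \rangle + \gamma^2\|\dot u\|^2 .
\end{align*}
% Divide by gamma > 0 and use
%   <grad f_i(u), u'> = (f_i o u)'   and   2 <u'', u'> = d/dt ||u'||^2 :
\begin{equation*}
\frac{m}{\gamma}\langle \nabla f_i(u), \ddot u \rangle
  + \frac{d}{dt}\Bigl[(f_i \circ u) + m\|\dot u\|^2\Bigr]
  \le -\frac{m^2}{\gamma}\|\ddot u\|^2 - \gamma\|\dot u\|^2 .
\end{equation*}
```

The term $m\|\dot u\|^2$ in the energy, without a factor $\frac{1}{2}$, comes from the cross term $2m\gamma\langle \ddot u, \dot u \rangle$ in this expansion.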

Proposition 4.2 suggests that we need a hypothesis on the parameters to ensure the dissipative
property:

(HP)$_i$    $\gamma^2 > mL_i$ where $L_i := \mathrm{Lip}(\nabla f_i; u([t_0, T[))$.

If (HP)$_i$ holds for all $i \in \{1,\dots,q\}$ then we just write (HP). This hypothesis asks the friction
parameter $\gamma$ to be large enough, in order to limit the inertial effects, which induce oscillations
(see Example REF). The hypothesis also implicitly asks the gradients $\nabla f_i$ to be Lipschitz over the
trajectory (since $\gamma \in \mathbb{R}$). Note that this last property holds whenever $u(\cdot)$ is bounded, since the
gradients are Lipschitz continuous on bounded sets (see Corollary 4.4).

As a direct consequence of the dissipative nature of the system, we obtain that the values $f_i(u(t))$
are bounded from above by $\max\{f_i(u_0);\ \mathcal{E}_i(t_0)\}$:

Corollary 4.3. (Upper bound for the values) Let $u : [t_0, T[ \to \mathcal{H}$ be a solution of (IMOG), such
that (HP) holds. Then, for all $i \in \{1,\dots,q\}$ and $t \ge t_0$, we have the following upper bounds:

(32)    $f_i(u(t)) \le \mathcal{E}_i(t_0) + \big(f_i(u_0) - \mathcal{E}_i(t_0)\big)\, e^{-\frac{\gamma}{m}(t - t_0)}.$

Proof. It is a direct consequence of the monotonicity property of $\mathcal{E}_i$ obtained in Proposition 4.2.
Indeed, we obtain for all $t \in [t_0, T[$:

(33)    $\frac{m}{\gamma}(f_i \circ u)'(t) \le \mathcal{E}_i(t_0) - (f_i \circ u)(t).$

The conclusion follows from Gronwall's Lemma, applied to $t \mapsto (f_i \circ u)(t) - \mathcal{E}_i(t_0)$. □
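The Gronwall step in this proof can be made explicit with an integrating factor. The following short computation is a sketch, where the auxiliary function $w$ is a name introduced here for convenience.

```latex
% Set w(t) := (f_i o u)(t) - E_i(t_0). Inequality (33) reads (m/gamma) w'(t) <= -w(t), so
\begin{equation*}
\frac{d}{dt}\Bigl[e^{\frac{\gamma}{m}(t - t_0)}\, w(t)\Bigr]
  = e^{\frac{\gamma}{m}(t - t_0)}\Bigl(w'(t) + \frac{\gamma}{m}\, w(t)\Bigr) \le 0 .
\end{equation*}
% Integrating between t_0 and t gives
\begin{equation*}
w(t) \le e^{-\frac{\gamma}{m}(t - t_0)}\, w(t_0)
     = e^{-\frac{\gamma}{m}(t - t_0)}\bigl(f_i(u_0) - \mathcal{E}_i(t_0)\bigr),
\end{equation*}
% which is exactly the upper bound (32).
```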

This upper bound for the values has two interesting consequences. The first one is immediate,
and gives a useful sufficient condition for the trajectory $u(\cdot)$ to be bounded:

Corollary 4.4. Suppose that there exists $i \in \{1,\dots,q\}$ such that $f_i$ is coercive, with $\nabla f_i$ globally
$L_i$-Lipschitz continuous and $\gamma^2 > mL_i$. Then any trajectory of (IMOG) is bounded.

The second consequence is that it tells us how to enforce the interesting property $f_i(u(\cdot)) \le f_i(u_0)$.
Indeed, we know that this dynamic is not a descent method for the functions because of the inertial
effects which can create damped oscillations. But at least, one can choose the initial
velocity appropriately so that each point on the trajectory is better than the initial one.

Corollary 4.5. Suppose that (HP)$_i$ holds for some $i \in \{1,\dots,q\}$. For all $u_0 \in \mathcal{H}$, if $\dot u_0 \in \mathcal{H}$ is
chosen to satisfy

(34)    $\langle \nabla f_i(u_0), \dot u_0 \rangle \le -\gamma\|\dot u_0\|^2,$

then $f_i(u(t)) \le f_i(u_0)$ for all $t \ge t_0$. In particular, for all $\lambda \in [0, \frac{1}{\gamma}]$, $\dot u_0 = \lambda s(u_0)$ satisfies$^1$ (34).

Proof of Corollary 4.5. We see in Corollary 4.3 that the conclusion holds whenever $f_i(u_0) - \mathcal{E}_i(t_0) \ge 0$.
This condition, once rewritten, is exactly (34). Now, consider $\dot u_0 = \lambda s(u_0)$ for some $\lambda \in [0, \frac{1}{\gamma}]$.
We recall that this steepest descent direction satisfies for all $i \in \{1,\dots,q\}$, see (4),

(35)    $\|s(u_0)\|^2 + \langle \nabla f_i(u_0), s(u_0) \rangle \le 0.$

So it follows easily that (34) holds for $\lambda s(u_0)$. □
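The admissible range of the scaling factor in Corollary 4.5 can be checked numerically on a toy instance. In the sketch below the gradients, the value of the steepest descent direction and the friction parameter are all illustrative numbers chosen by hand, and `satisfies_34` is a hypothetical helper name.

```python
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def satisfies_34(grads: Sequence[Sequence[float]], v0: Sequence[float],
                 gamma: float, tol: float = 1e-12) -> bool:
    """Check <grad f_i(u0), v0> <= -gamma * ||v0||^2 for every objective i."""
    return all(dot(g, v0) <= -gamma * dot(v0, v0) + tol for g in grads)


# Toy data: gradients of two objectives at u0, and the corresponding direction
# s(u0) = -(0.5, 0.5), i.e. minus the min-norm point of the segment joining them.
grads = [(1.0, 0.0), (0.0, 1.0)]
s0 = (-0.5, -0.5)
gamma = 2.0  # so the admissible range for the scaling factor is [0, 1/gamma] = [0, 0.5]

for lam in (0.0, 0.25, 0.5, 0.6):
    v0 = tuple(lam * c for c in s0)
    print(lam, satisfies_34(grads, v0, gamma))
```

On this instance the condition holds for the three values inside $[0, 1/\gamma]$ and fails for the value $0.6$ outside it, which shows that the upper bound $1/\gamma$ cannot be relaxed here.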

$^1$ Observe that the set of vectors satisfying this property recalls the notion of pseudo-gradient introduced by
Miglierina [30].

4.2. Energy estimations. We will now use the dissipative property of the system to deduce
energy estimations for a global solution of (IMOG).

Proposition 4.6. (Energy estimations) Let $u : [t_0, +\infty[ \to \mathcal{H}$ be a bounded global solution of
(IMOG) satisfying (HP). Then,

(i) For all $i \in \{1,\dots,q\}$, $\mathcal{E}_i(t) \downarrow \mathcal{E}_i^\infty \in \mathbb{R}$ as $t \to +\infty$.

(ii) $\dot u \in L^\infty(t_0, +\infty; \mathcal{H}) \cap L^2(t_0, +\infty; \mathcal{H})$ and $\lim_{t \to +\infty} \|\dot u(t)\| = 0$.

(iii) $\ddot u \in L^\infty(t_0, +\infty; \mathcal{H}) \cap L^2(t_0, +\infty; \mathcal{H})$ and $\liminf\mathrm{ess}_{t \to +\infty} \|\ddot u(t)\| = 0$.

(iv) For all $i \in \{1,\dots,q\}$, $(f_i \circ u)' \in L^\infty(t_0, +\infty; \mathbb{R})$ and $\lim_{t \to +\infty} (f_i \circ u)'(t) = 0$.

(v) For all $i \in \{1,\dots,q\}$, $(f_i \circ u) \in L^\infty(t_0, +\infty; \mathbb{R})$ and $\lim_{t \to +\infty} (f_i \circ u)(t) = \mathcal{E}_i^\infty$.

(vi) For all $i \in \{1,\dots,q\}$, there exists $\theta_i \in L^\infty(t_0, +\infty; \mathbb{R})$ such that for all $t \in [t_0, +\infty[$,

$m\ddot u(t) + \gamma\dot u(t) + \sum_{i=1}^q \theta_i(t) \nabla f_i(u(t)) = 0$ with $\theta(t) \in S^q$.

In particular, it follows that $\sum_{i=1}^q \theta_i(\cdot)(f_i \circ u)' \in L^1(t_0, +\infty; \mathbb{R})$.

Proof. We start by proving that $\dot u \in L^\infty(t_0, +\infty; \mathcal{H})$, from which the other results will follow easily.

Take any $i \in \{1,\dots,q\}$, and define $c := \inf_{t \ge t_0} f_i(u(t)) - \mathcal{E}_i(t_0)$ and $M := \max_{i \in \{1,\dots,q\}} \sup_{t \ge t_0} \|\nabla f_i(u(t))\|$.

Given that the gradients $\nabla f_i$ are Lipschitz continuous on bounded sets, we deduce (using the mean
value theorem) that the functions $f_i$ are bounded on bounded sets. Since the trajectory is bounded,
it follows that $M$ and $c$ are finite. In particular it implies that $m\ddot u + \gamma\dot u \in L^\infty(t_0, +\infty; \mathcal{H})$, since,
according to (IMOG), we have for a.e. $t \ge t_0$ that $-m\ddot u(t) - \gamma\dot u(t) \in \mathrm{co}\{\nabla f_i(u(t))\}$, which is
bounded by $M$.

Using the monotonicity property of $\mathcal{E}_i$ (see Proposition 4.2), we have for all $t \ge t_0$:

(36)    $0 \ge \mathcal{E}_i(t) - \mathcal{E}_i(t_0) \ge m\|\dot u(t)\|^2 + \frac{m}{\gamma}(f_i \circ u)'(t) + c.$

Using the Cauchy-Schwarz inequality and the definition of $M$, one has

(37)    $(f_i \circ u)'(t) = \langle \nabla f_i(u(t)), \dot u(t) \rangle \ge -\|\nabla f_i(u(t))\|\,\|\dot u(t)\| \ge -M\|\dot u(t)\|.$

If we set $b = \frac{m}{\gamma}M$, we obtain

(38)    $0 \ge m\|\dot u(t)\|^2 - b\|\dot u(t)\| + c.$

If we now consider the real polynomial $mX^2 - bX + c$ with $m > 0$, we can see that it takes
nonpositive values on a compact interval, independent of $t$. Since $\|\dot u(t)\|$ lies therein, we conclude that
$\dot u \in L^\infty(t_0, +\infty; \mathcal{H})$.
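The compact interval mentioned in this argument can be written down explicitly. The computation below is a sketch using the constants $m$, $b$ and $c$ of the proof; the roots $X_\pm$ are notation introduced here.

```latex
% The quadratic m X^2 - b X + c (with m > 0) is nonpositive exactly between its real roots
\begin{equation*}
X_\pm = \frac{b \pm \sqrt{b^2 - 4mc}}{2m},
\end{equation*}
% which exist because X = ||u'(t)|| satisfies m X^2 - b X + c <= 0 (so b^2 - 4mc >= 0).
% Consequently, for all t >= t_0,
\begin{equation*}
\|\dot u(t)\| \le \frac{b + \sqrt{b^2 - 4mc}}{2m},
\qquad b = \frac{m}{\gamma} M ,
\end{equation*}
% a bound that depends only on m, gamma, M and c, and not on t.
```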

We can now derive the other properties, and we start with (i). The decreasing property of the
energies $\mathcal{E}_i$ (see Proposition 4.2) ensures the existence of a limit $\mathcal{E}_i^\infty$, possibly equal to
$-\infty$. But now we can prove that for all $i \in \{1,\dots,q\}$, $\mathcal{E}_i^\infty \in \mathbb{R}$. Indeed, using the same inequality
as in (37),

(39)    $\mathcal{E}_i^\infty = \lim_{t \to +\infty} \mathcal{E}_i(t) \ge \inf_{t \ge t_0} f_i(u(t)) - \frac{m}{\gamma} M \|\dot u\|_{L^\infty(t_0, +\infty; \mathcal{H})} > -\infty.$

We now prove (iii). Since $m\ddot u + \gamma\dot u$ and $\dot u$ lie in $L^\infty(t_0, +\infty; \mathcal{H})$, we directly obtain from $m > 0$
that $\ddot u \in L^\infty(t_0, +\infty; \mathcal{H})$. For the estimation, use Proposition 4.2 to obtain:

(40)    $\frac{m^2}{\gamma} \int_{t_0}^{+\infty} \|\ddot u(t)\|^2\, dt \le \int_{t_0}^{+\infty} -\frac{d}{dt}\mathcal{E}_i(t)\, dt = \mathcal{E}_i(t_0) - \mathcal{E}_i^\infty.$

It follows that $\ddot u \in L^2(t_0, +\infty; \mathcal{H})$, and then $\liminf\mathrm{ess}_{t \to +\infty} \|\ddot u(t)\| = 0$.

Let us now prove (ii). Using exactly the same argument as for $\ddot u$, one obtains $\dot u \in L^2(t_0, +\infty; \mathcal{H})$.
Moreover, we know that $\dot u$ is Lipschitz continuous on $[t_0, +\infty[$ (since $\ddot u \in L^\infty(t_0, +\infty; \mathcal{H})$), so it
follows that $\lim_{t \to +\infty} \|\dot u(t)\| = 0$.

We continue with items (iv) and (v). From the Cauchy-Schwarz inequality, $|(f_i \circ u)'(t)| \le M\|\dot u(t)\|$ for
all $t \ge t_0$. As a direct consequence of (ii), we deduce $(f_i \circ u)' \in L^\infty(t_0, +\infty; \mathbb{R})$ and
$\lim_{t \to +\infty} (f_i \circ u)'(t) = 0$. Then it follows directly from (i) that $\lim_{t \to +\infty} (f_i \circ u)(t) = \mathcal{E}_i^\infty$,
and $(f_i \circ u) \in L^\infty(t_0, +\infty; \mathbb{R})$.

We end the proof with item (vi). It is clear from the definition of (IMOG) that for all $t \ge t_0$,
there exists $\theta(t) = (\theta_i(t))_{i \in \{1,\dots,q\}} \in S^q$ such that $m\ddot u(t) + \gamma\dot u(t) + \sum_{i=1}^q \theta_i(t) \nabla f_i(u(t)) = 0$. To get
$\theta_i \in L^\infty(t_0, +\infty; \mathbb{R})$, the whole point is to verify that it can be taken measurable. For this, we write
$\theta(t)$ as a solution of the following optimization problem

(41)    $\theta(t) \in \operatorname{argmin}_{\theta \in S^q} j(t, \theta)$, where $j(t, \theta) := \Big\| \sum_{i=1}^q \theta_i \nabla f_i(u(t)) \Big\|$.

Since $j$ is a Carathéodory integrand, we are guaranteed the existence of a measurable selection
$\theta : t \mapsto \theta(t) \in \operatorname{argmin}_{\theta \in S^q} j(t, \theta)$ (see [35, Propositions 14.6, 14.32 and 14.37]). Now we can write

(42)    $\sum_{i=1}^q \theta_i(t)(f_i \circ u)'(t) = \sum_{i=1}^q \theta_i(t) \langle \nabla f_i(u(t)), \dot u(t) \rangle = \langle -m\ddot u(t) - \gamma\dot u(t), \dot u(t) \rangle,$

where $\ddot u, \dot u \in L^2(t_0, +\infty; \mathcal{H})$. So, using the Cauchy-Schwarz inequality and the measurability of $\theta_i$,
we get directly that $\sum_{i=1}^q \theta_i(\cdot)(f_i \circ u)' \in L^1(t_0, +\infty; \mathbb{R})$. □

i=l 

4.3. Convergence of the trajectories of (IMOG). We present here the main result of this 
section. Under a convexity assumption, we show that the bounded trajectories of (IMOG) weakly 
converge to a solution. 

Theorem 4.7. Suppose that the objective functions $f_i$ are convex. Then any bounded trajectory $u : [t_0,+\infty[ \to \mathcal{H}$ of (IMOG) satisfying (HP) converges weakly to a weak Pareto optimum.

We sketch here the main points of the proof. The convergence essentially relies on Opial's Lemma, which we recall below (we denote by $\Omega[u(t)]$ the set of weak sequential cluster points of the trajectory):

Lemma 4.8. (Opial) Let $S$ be a nonempty subset of $\mathcal{H}$, and $u : [t_0,+\infty[ \to \mathcal{H}$. Assume that

(i) $\Omega[u(t)] \subset S$;

(ii) for every $z \in S$, $\lim_{t\to+\infty} \|u(t) - z\|$ exists.

Then $u(t)$ weakly converges to some element $u^\infty \in S$.

It is applied to the set

$S := \{ x \in \mathcal{H} \mid f_i(x) \le \lim_{t\to+\infty} f_i(u(t)) \text{ for all } i \in \{1,\dots,q\} \}$,

for which (i) is easy to obtain. The key point to prove the Fejér property (ii) is that $h(t) := \frac{1}{2}\|u(t) - z\|^2$ satisfies a differential inequality. Indeed we have the following result from [8, Lemma 4.2] or [9, Lemma 2.3]:

Lemma 4.9. Let $h \in C^1(t_0,+\infty;\mathbb{R})$ be a positive function satisfying $m\ddot h + \gamma\dot h \le g$ where $m, \gamma > 0$ and $g \in L^1(t_0,+\infty;\mathbb{R})$. Then $\lim_{t\to+\infty} h(t)$ exists.
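
The following minimal sketch illustrates Lemma 4.9 numerically in the extremal case where the differential inequality holds with equality. The choices $g(t) = e^{-t}$, $m = 1$, $\gamma = 3$, $h(0) = 1$, $\dot h(0) = 0$ and the explicit Euler scheme are assumptions made only for this illustration; `integrate_h` is a hypothetical helper name.

```python
import math


def integrate_h(m=1.0, gamma=3.0, T=40.0, dt=1e-3):
    """Explicit Euler for m*h'' + gamma*h' = g(t) with g(t) = exp(-t),
    h(0) = 1 and h'(0) = 0. Returns the list of sampled values of h."""
    h, w = 1.0, 0.0                     # w stands for h'
    values = [h]
    for k in range(int(round(T / dt))):
        g = math.exp(-k * dt)           # integrable right-hand side
        h, w = h + dt * w, w + dt * (g - gamma * w) / m
        values.append(h)
    return values


values = integrate_h()
h_mid, h_end = values[len(values) // 2], values[-1]
```

Integrating the equation over $[0,+\infty[$ gives $\gamma\,(h^\infty - h(0)) = \int_0^{+\infty} g$, so with these choices the limit should be close to $4/3$, and `h_mid` and `h_end` should agree up to a negligible tail.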

Once the weak convergence of the trajectory is obtained, the characterisation of its limit point as a weak Pareto point is a direct consequence of the demiclosedness property of $u \mapsto \mathrm{co}\{\nabla f_i(u)\}$ (see for example [6, Lemma 2.4]):

Lemma 4.10. If $u_n^* \in \mathrm{co}\{\nabla f_i(u_n)\}$ with $\lim_{n\to+\infty} u_n^* = 0$ and $w\text{-}\lim_{n\to+\infty} u_n = u_\infty$, then $0 \in \mathrm{co}\{\nabla f_i(u_\infty)\}$.

Proof of Theorem 4.7. Since $u$ is bounded, there exists some $t_n \to +\infty$ such that $u(t_n)$ converges weakly to some $u^\infty$. For all $i \in \{1,\dots,q\}$, since $f_i$ is convex and continuous, it is in particular weakly lower semicontinuous. Hence, using Proposition 4.6 we get

(43) $f_i(u^\infty) \le \liminf_{n\to+\infty} f_i(u(t_n)) = \lim_{t\to+\infty} f_i(u(t)).$

This proves that $\Omega[u(t)] \subset S$, which is condition (i) of Opial's Lemma. To obtain convergence of the trajectory through Opial's Lemma, it remains to prove the Fejér property (ii). That is, given some $z \in S$, prove that $\lim_{t\to+\infty} \|u(t) - z\|$ exists.

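Define $h(t) := \frac{1}{2}\|u(t) - z\|^2$ for all $t \ge t_0$. Since $\dot u$ is absolutely continuous, $h$ is twice differentiable for a.e. $t \in [t_0,+\infty[$, and

(44) $\dot h(t) = \langle \dot u(t), u(t) - z\rangle$,

(45) $\ddot h(t) = \langle \ddot u(t), u(t) - z\rangle + \|\dot u(t)\|^2$.

A linear combination of (44) and (45) gives

(46) $m\ddot h(t) + \gamma\dot h(t) = m\|\dot u(t)\|^2 + \langle -m\ddot u(t) - \gamma\dot u(t), z - u(t)\rangle$.

Let $\theta(t) \in \mathcal{S}^q$ be such that $-m\ddot u(t) - \gamma\dot u(t) = \sum_{i=1}^{q} \theta_i(t)\nabla f_i(u(t))$, then we can rewrite

(47) $m\ddot h(t) + \gamma\dot h(t) = m\|\dot u(t)\|^2 + \sum_{i=1}^{q} \theta_i(t)\langle \nabla f_i(u(t)), z - u(t)\rangle$.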

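For any $i \in \{1,\dots,q\}$, we use the monotonicity of $\mathcal{E}_i$ and $z \in S$ (recall that $\mathcal{E}_i^\infty = \lim_{t\to+\infty} f_i(u(t))$) together with the convexity of $f_i$, to obtain for all $t \in [t_0,+\infty[$:

(48) $\mathcal{E}_i(t) = f_i(u(t)) + \frac{m}{\gamma}(f_i\circ u)'(t) + m\|\dot u(t)\|^2 \ge \mathcal{E}_i^\infty \ge f_i(z) \ge f_i(u(t)) + \langle \nabla f_i(u(t)), z - u(t)\rangle$.

Thus, since $\sum_{i=1}^{q}\theta_i(t) = 1$, it follows from (47) and (48) that

(49) $m\ddot h(t) + \gamma\dot h(t) \le 2m\|\dot u(t)\|^2 + \frac{m}{\gamma}\sum_{i=1}^{q} \theta_i(t)(f_i\circ u)'(t)$,

where the right member of (49) lies in $L^1(t_0,+\infty;\mathbb{R})$ (see Proposition 4.6).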

Thus, the hypothesis of Lemma 4.9 is satisfied, and $\lim_{t\to+\infty} h(t)$ exists. It follows from Opial's Lemma that $u(t)$ weakly converges to some $u^\infty \in S$. It remains to prove that $u^\infty$ is a weak Pareto point. In (IMOG), we have $-m\ddot u(t) - \gamma\dot u(t) \in \mathrm{co}\{\nabla f_i(u(t))\}$, where $w\text{-}\lim_{t\to+\infty} u(t) = u^\infty$ and $\operatorname*{liminf\,ess}_{t\to+\infty} \|m\ddot u(t) + \gamma\dot u(t)\| = 0$ (see Proposition 4.6). Then we can apply Lemma 4.10 to get $0 \in \mathrm{co}\{\nabla f_i(u^\infty)\}$. Following Proposition 2.2, this is equivalent to $u^\infty$ being a weak Pareto point. □
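
As an illustration of Theorem 4.7, the following minimal sketch integrates (IMOG) for two convex quadratics in the plane and checks that the trajectory ends near a point where $0 \in \mathrm{co}\{\nabla f_i(u)\}$. Everything specific here is an assumption made for the example: the objectives $f_1(x) = \frac{1}{2}\|x-a\|^2$ and $f_2(x) = \frac{1}{2}\|x-b\|^2$, the values $m = 1$, $\gamma = 3$, the semi-implicit Euler scheme, and the helper names. Hypothesis (HP) is not checked in the code.

```python
import math


def min_norm_element(g1, g2):
    """Element of minimal norm of the segment co{g1, g2} (case q = 2)."""
    d = [x - y for x, y in zip(g1, g2)]
    dd = sum(x * x for x in d)
    t = 0.5 if dd == 0.0 else min(1.0, max(0.0, -sum(x * y for x, y in zip(d, g2)) / dd))
    return [t * x + (1.0 - t) * y for x, y in zip(g1, g2)]


def simulate_imog(u0, a, b, m=1.0, gamma=3.0, T=40.0, dt=1e-2):
    """Semi-implicit Euler for m*u'' + gamma*u' + co{grad f1(u), grad f2(u)}^0 = 0
    with f1(x) = 0.5*||x - a||^2, f2(x) = 0.5*||x - b||^2, starting at rest."""
    u, v = list(u0), [0.0] * len(u0)
    for _ in range(int(round(T / dt))):
        g1 = [x - y for x, y in zip(u, a)]          # grad f1(u)
        g2 = [x - y for x, y in zip(u, b)]          # grad f2(u)
        s = min_norm_element(g1, g2)                # steering direction
        v = [vi + dt * (-gamma * vi - si) / m for vi, si in zip(v, s)]
        u = [ui + dt * vi for ui, vi in zip(u, v)]  # position uses updated velocity
    return u, v


a, b = [0.0, 0.0], [1.0, 0.0]
u_end, v_end = simulate_imog([2.0, 3.0], a, b)
g1 = [x - y for x, y in zip(u_end, a)]
g2 = [x - y for x, y in zip(u_end, b)]
residual = math.sqrt(sum(x * x for x in min_norm_element(g1, g2)))
speed = math.sqrt(sum(x * x for x in v_end))
```

For these two objectives the weak Pareto set is the segment $[a, b]$, and the minimal-norm element at $u$ equals $u$ minus its projection onto that segment, so `residual` measures the distance from the final iterate to the Pareto set.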

Remark 4.11. If the objective functions are not convex, we can still say something about the limit points: each weak limit point of a bounded trajectory of (IMOG) is a critical Pareto point (see Proposition 4.6 and Lemma 4.10).

5. Conclusion

We presented an inertial continuous dynamic for multi-objective optimization, namely (IMOG). 
We have shown the existence of global trajectories for (IMOG), and their asymptotic convergence in 
the convex case to weak Pareto points. The general problem of the uniqueness of these trajectories 
remains open, as for the first-order dynamic (see [6, 7]). Our study was motivated by the fact that
inertial methods usually produce trajectories that converge more quickly than first-order methods. 
It would now be interesting to study the rate of convergence of the trajectories of (IMOG) to weak 
Pareto points. Given the recent results of [16, 37], it seems natural to consider a modified version 
of (IMOG), allowing the viscosity parameter $\gamma$ to be time-dependent. In particular, a dependence
of the type $\gamma(t) = \frac{\alpha}{t}$ should open the road to FISTA-like algorithms for the resolution of
multi-objective optimization problems. Of course, these questions are beyond the scope of this paper, and
should be treated in future work.


Appendix A. 

We give here the two integral forms of Gronwall's Lemma that we used in the proof of Theorem
3.5. They can be found in Brezis's book [14, Lemma A.4 & Lemma A.5, pp. 156-157].

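Lemma A.1 (Gronwall-Bellman). Let $t_0 \in \mathbb{R}$ and $T \in ]t_0,+\infty[$. Let $a \in [0,+\infty[$, and $g \in L^1([t_0,T],\mathbb{R})$ with $g(t) \ge 0$ for a.e. $t \in [t_0,T]$. Let $h \in C^0([t_0,T],\mathbb{R})$ such that

(50) $h(t) \le a + \int_{t_0}^{t} g(s)h(s)\,ds$ for all $t \in [t_0,T]$.

Then $h(t) \le a\,e^{\int_{t_0}^{t} g(s)\,ds}$ for all $t \in [t_0,T]$.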

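Lemma A.2. Let $t_0 \in \mathbb{R}$ and $T \in ]t_0,+\infty[$. Let $a \in [0,+\infty[$, and $g \in L^1([t_0,T],\mathbb{R})$ with $g(t) \ge 0$ for a.e. $t \in [t_0,T]$. Let $h \in C^0([t_0,T],\mathbb{R})$ such that

(51) $\frac{1}{2}h^2(t) \le \frac{1}{2}a^2 + \int_{t_0}^{t} g(s)h(s)\,ds$ for all $t \in [t_0,T]$,

then $|h(t)| \le a + \int_{t_0}^{t} g(s)\,ds$ for all $t \in [t_0,T]$.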